Abstract

Anisotropic decompositions using representation systems such as curvelets,
contourlets, or shearlets have recently attracted significantly increased attention due to
the fact that they were shown to provide optimally sparse approximations of functions 
exhibiting singularities on lower dimensional embedded manifolds. The literature now 
contains various direct proofs of this fact and of related sparse approximation results. 
However, it seems quite cumbersome to prove such a canon of results for each system 
separately, while many of the systems exhibit certain similarities. 

In this paper, with the introduction of the concept of sparsity equivalence, we aim to 
provide a framework which allows categorization of the ability for sparse approximations 
of representation systems. This framework, in particular, enables transferring results 
on sparse approximations from one system to another. We demonstrate this concept 
for the example of curvelets and shearlets, and discuss how this viewpoint immediately
leads to novel results for both systems. 
1 Introduction



Recently, a paradigm shift could be observed in applied mathematics, computer science, 
and electrical engineering. The novel paradigm of sparse approximations now enables not 
only highly efficient encoding of functions and signals, but also provides intriguing new 
methodologies, for instance, for recovery of missing data or separation of morphologically 
distinct components. At about the same time, scientists began to question whether wavelets 
are indeed perfectly suited for image processing tasks, the main reason being that images are 
governed by edges while wavelets are isotropic objects. This mismatch also becomes evident
when recalling that Besov spaces can be characterized by the decay of wavelet coefficient
sequences; Besov models, however, are clearly deficient in adequately capturing edges.

These two fundamental observations have led to the research area of geometric multi- 
scale analysis whose main goal is to develop representation systems, preferably containing 
different scales, which are sensitive to anisotropic features in functions/signals and pro- 
vide sparse approximations of those. Such representation systems shall for now be loosely 
coined anisotropic systems. To name just a few entries on the long list: directional
filter banks [2], directional wavelets [1], ridgelets [6], complex wavelets [17], (first and
second generation) curvelets [9, 10, 11], contourlets [12], bandlets [25], and shearlets [15, 20].
Browsing through the literature, it becomes evident that sparse approximation properties 
are quite similar for some systems such as curvelets and shearlets, whereas other systems 
such as ridgelets show a different behavior. Delving further into the literature, we observe that
for those systems exhibiting similar sparsity behavior many results were proven with very
similar proofs. One might ask: Is this cumbersome near-repetition of proofs really necessary?
We believe that the answer is no and that a formalization of sparse approximation
properties of anisotropic systems solves this problem.

The main goal of this paper is to introduce the concept of sparsity equivalence for
anisotropic systems, leading to equivalence classes for sparsity properties, and thereby aiming
for the aforementioned formalization of sparse approximation properties. Our theoretical
considerations are anticipated to have the following impacts: 

• A thorough understanding of the ingredients of anisotropic systems which are crucial 
for an observed sparse approximation property, thereby also categorizing different 
sparsity behaviors. 

• A framework within which sparsity results can be directly transferred from one system 
to others. 

• A quality measure for new anisotropic systems which they have to pass to be consid- 
ered eligible for a particular sparsity analysis. 

1.1 The Concept of Sparsity Equivalence of Frame Expansions 

Frame expansions are extensively utilized in applied mathematics, computer science, and 
electrical engineering if non-uniqueness, yet stability, is required, and might be regarded
as a natural generalization of the concept of an orthonormal basis. Non-uniqueness of an 
expansion is customarily exploited for deriving resilience against erasures or quantization. 
However, lately the flexibility of such non-unique expansions has been shown to lead to 
optimally sparse approximations of particular model classes of functions, where sparsity of
a coefficient sequence $(c_i)_{i \in I}$ is ideally measured in the $\|\cdot\|_0$-norm counting the number
of non-zero entries. The fundamental fact that this measure can be approximated by the
$\|\cdot\|_1$-norm as the closest convex norm has initiated and led to a deluge of results in the
area of sparse approximations and recovery; see the survey paper [4].

Before continuing, let us briefly illustrate the precise relation of this sparsity measure 
with sparse approximation properties. Let $(\varphi_i)_{i \in I}$ be a tight frame for a Hilbert space $\mathcal{H}$, say,
and let $\mathcal{K} \subset \mathcal{H}$ be a class whose elements we desire to sparsely approximate. Approximation
theory then paves the way to measure the ability of $(\varphi_i)_{i \in I}$ for sparse approximations of
elements of $\mathcal{K}$, and typically the decay of the squared error of the 'best' n-term approximation,
i.e., the behavior of

$$\Big\| f - \sum_{n \le N} (\langle f, \varphi_i \rangle)_{(n)} \, \varphi_{i_n} \Big\|^2 \quad \text{as } N \to \infty, \qquad (1)$$

where $(\langle f, \varphi_i \rangle)_{(n)}$ is the n-th largest coefficient and $\varphi_{i_n}$ the associated frame element, is analyzed.
Intriguingly, in the case of a redundant system, it is not clear whether this is indeed the best
n-term approximation; nevertheless it is customarily exploited as a suitable substitute for lack
of a more accurate and still conveniently applicable selection rule. The term in (1) can now be
estimated by

$$\Big\| f - \sum_{n \le N} (\langle f, \varphi_i \rangle)_{(n)} \, \varphi_{i_n} \Big\|^2 \le C \cdot \sum_{n > N} |(\langle f, \varphi_i \rangle)_{(n)}|^2. \qquad (2)$$

Then the relation to $\|(\langle f, \varphi_i \rangle)_i\|_p$ ($0 < p < 1$) is established by observing that $\|(\langle f, \varphi_i \rangle)_i\|_p \le C$
implies that the number of coefficients $\langle f, \varphi_i \rangle$ exceeding $1/n$ in magnitude is bounded by $C n^{p}$, thus
the magnitude of the n-th largest coefficient $(\langle f, \varphi_i \rangle)_{(n)}$ is not bigger than $C' n^{-1/p}$.
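The relation between a bounded $\ell_p$ quasi-norm and the decay of the sorted coefficients can be checked numerically. The following is a minimal sketch, not part of the original analysis; the helper names `lp_quasi_norm` and `sorted_magnitudes` and the random test sequence are illustrative assumptions. It relies on the elementary fact that $n \cdot |c|_{(n)}^p \le \sum_i |c_i|^p$.

```python
import numpy as np

def lp_quasi_norm(c, p):
    """Return (sum_i |c_i|^p)^(1/p) for 0 < p <= 1 (hypothetical helper)."""
    c = np.asarray(c, dtype=float)
    return float(np.sum(np.abs(c) ** p) ** (1.0 / p))

def sorted_magnitudes(c):
    """Return |c_i| sorted in decreasing order, i.e. the n-th largest coefficients."""
    return np.sort(np.abs(np.asarray(c, dtype=float)))[::-1]

# Illustrative data: an arbitrary random sequence (assumption, not from the paper).
rng = np.random.default_rng(0)
p = 0.5
c = rng.standard_normal(1000) * rng.random(1000) ** 4

C = lp_quasi_norm(c, p)
mags = sorted_magnitudes(c)
n = np.arange(1, mags.size + 1)

# The n-th largest coefficient is bounded by C * n^(-1/p).
bound = C * n ** (-1.0 / p)
decay_ok = bool(np.all(mags <= bound * (1.0 + 1e-12)))
```

Here `decay_ok` records whether every sorted coefficient respects the $C\,n^{-1/p}$ envelope for this particular sequence.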

As elaborated above, there do exist frames which show very similar
sparse approximation properties. Aiming towards a categorization of sparsity properties,
we immediately observe that the well-exploited unitary equivalence of frames does not
serve our purposes here; the reason being that $\|(\langle f, U\varphi_i \rangle)_i\|_p = \|(\langle U^{-1} f, \varphi_i \rangle)_i\|_p$ for all
$f \in \mathcal{K}$, however the class $\mathcal{K}$ does not need to be invariant under the unitary operator $U^{-1}$.
Evidently, the equivalence relation we truly aim for is as follows:

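Definition 1.1 Let $(\varphi_i)_{i \in I}$ and $(\psi_j)_{j \in J}$ be two frames for a Hilbert space $\mathcal{H}$, let $\mathcal{K}$ be a
subset of $\mathcal{H}$, and let $0 < p < 1$. Then $(\varphi_i)_{i \in I}$ and $(\psi_j)_{j \in J}$ are sparsity equivalent in $\ell_p$ with
respect to $\mathcal{K}$, if, for each $f \in \mathcal{K}$, we have $\|(\langle f, \varphi_i \rangle)_i\|_p < \infty$ if and only if $\|(\langle f, \psi_j \rangle)_j\|_p < \infty$.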

This property is in fact a property of the cross-Grammian matrix $(\langle \varphi_i, \psi_j \rangle)_{i,j}$, more
precisely, of diagonal dominance of this matrix. A suitable norm for measuring the decay
of this matrix away from the diagonal was introduced in [11], and is defined as follows: For
$p \in (0, 1]$, the $\|\cdot\|_{Op,p}$-norm of a matrix $M = (m_{i,j})_{i,j}$ is given by

$$\|M\|_{Op,p} = \max\Big\{ \sup_i \sum_j |m_{i,j}|^p, \; \sup_j \sum_i |m_{i,j}|^p \Big\}^{1/p}.$$




This norm indeed measures whether sparsity equivalence is present, and we obtain the
following result. Notice however, that the condition on the cross-Grammian matrix is
far from necessary, which can be seen by the fact that it implies sparsity equivalence in $\ell_p$
with respect to any subset $\mathcal{K}$.
Lemma 1.1 Let $(\varphi_i)_{i \in I}$ and $(\psi_j)_{j \in J}$ be two tight frames for a Hilbert space $\mathcal{H}$, let $\mathcal{K}$ be a
subset of $\mathcal{H}$, and let $0 < p < 1$. If $\|(\langle \varphi_i, \psi_j \rangle)_{i,j}\|_{Op,p}$ is finite, then $(\varphi_i)_{i \in I}$ and $(\psi_j)_{j \in J}$ are
sparsity equivalent in $\ell_p$ with respect to $\mathcal{K}$.

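Proof. Let $f \in \mathcal{K}$, and assume that $\|(\langle f, \varphi_i \rangle)_i\|_p < \infty$. From $\|(\langle \varphi_i, \psi_j \rangle)_{i,j}\|_{Op,p} < \infty$ it
follows that

$$\sup_i \sum_j |\langle \varphi_i, \psi_j \rangle|^p < \infty \quad \text{and} \quad \sup_j \sum_i |\langle \varphi_i, \psi_j \rangle|^p < \infty. \qquad (3)$$

Using the fact that $(\varphi_i)_i$ is a tight frame,

$$\|(\langle f, \psi_j \rangle)_j\|_p^p = \Big\|\Big(\Big\langle \sum_i \langle f, \varphi_i \rangle \varphi_i, \psi_j \Big\rangle\Big)_j\Big\|_p^p = \Big\|\Big(\sum_i \langle f, \varphi_i \rangle \langle \varphi_i, \psi_j \rangle\Big)_j\Big\|_p^p.$$

Now, since $p < 1$,

$$\sum_j \Big| \sum_i \langle f, \varphi_i \rangle \langle \varphi_i, \psi_j \rangle \Big|^p \le \sum_j \sum_i |\langle f, \varphi_i \rangle|^p |\langle \varphi_i, \psi_j \rangle|^p = \sum_i |\langle f, \varphi_i \rangle|^p \sum_j |\langle \varphi_i, \psi_j \rangle|^p,$$

which is finite by (3) and due to the fact that $\|(\langle f, \varphi_i \rangle)_i\|_p < \infty$.

For symmetry reasons, the implication $\|(\langle f, \psi_j \rangle)_j\|_p < \infty \Rightarrow \|(\langle f, \varphi_i \rangle)_i\|_p < \infty$ can be
derived similarly. The lemma is proved. □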
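The key inequality in the proof of Lemma 1.1 can be illustrated on finite matrices. The sketch below assumes that the $\|\cdot\|_{Op,p}$-norm is the maximum of the largest row sum and the largest column sum of $|m_{i,j}|^p$, raised to the power $1/p$; the function names and the random test data are hypothetical and only serve as an illustration.

```python
import numpy as np

def op_p_norm(M, p):
    """Assumed form of the Op,p norm: max of sup row/column sums of |m_ij|^p, to the power 1/p."""
    A = np.abs(np.asarray(M, dtype=float)) ** p
    return float(max(A.sum(axis=1).max(), A.sum(axis=0).max()) ** (1.0 / p))

def lp_p(c, p):
    """Return sum_i |c_i|^p, the p-th power of the l_p quasi-norm."""
    return float(np.sum(np.abs(np.asarray(c, dtype=float)) ** p))

# Arbitrary finite stand-in for a cross-Grammian matrix and a coefficient sequence.
rng = np.random.default_rng(1)
p = 0.7
M = 0.1 * rng.standard_normal((40, 60))
c = rng.standard_normal(60)

# For p <= 1: ||M c||_p^p <= ||M||_{Op,p}^p * ||c||_p^p.
lhs = lp_p(M @ c, p)
rhs = op_p_norm(M, p) ** p * lp_p(c, p)
```

In the setting of the lemma, `M` plays the role of the cross-Grammian matrix and `c` that of the coefficient sequence of $f$ in one of the two frames.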

We will now demonstrate this concept for the pair of curvelets and shearlets, which are
two prominent examples of anisotropic systems even sharing parabolic scaling as the main 
anisotropic force. The intuition that they should be sparsity equivalent is substantiated by 
comparing results on sparse approximation properties of curvelets and shearlets. And, in 
fact, the result derived in Subsection 1.3 shows this to be true. Before stating the result, 
we first need to introduce those two systems. 

1.2 Curvelets and Shearlets 

We now recall the definitions of curvelets - focusing on second generation curvelets - and
shearlets. These two systems serve as the running example in our demonstration of the
framework of sparsity equivalence.

1.2.1 Curvelets 

The main motivation for the introduction of curvelets came from the observation that - by 
taking a computer vision point of view - edges are those features governing an image while 
separating smooth regions. A first model for this viewpoint was introduced in [13] and
coined a 'cartoon-like model'. This model then in fact revealed the suboptimal treatment
of edges by wavelets, which at that time were seemingly the superior system.

The introduction of (first generation) tight curvelet frames in 2004 by Candes and 
Donoho [9], which provably provide (almost) optimally sparse approximations within such 
a cartoon-like model might be considered a milestone in applied harmonic analysis. Later, 
second generation curvelets were introduced in [11] due to a more satisfactory associated 
system with continuous parameters [10], and were shown to provide optimally sparse de- 
compositions of Fourier Integral Operators [5]. 
To present the definition of these second generation curvelets - from now on also called
curvelets in contrast to first generation curvelets -, let $W$ be the Fourier transform of a
one-dimensional wavelet and $V$ be a 'bump function' in Fourier space. We select both
functions to be band-limited, where $\mathrm{supp}\, W \subset [-2,-1/2] \cup [1/2,2]$ and $\mathrm{supp}\, V \subset [-1,1]$,
and to satisfy $W, V \in C^\infty$. Curvelets live on anisotropic regions of width $2^{-j}$ and length
$2^{-j/2}$ at various orientations, which are parameterized by angle. For our purposes, it is
sufficient to ignore the low frequency part in curvelet decompositions as discussed later.
We just mention that appropriate low frequency functions can be added to the curvelet
system defined below to force it to become a tight frame for $L^2(\mathbb{R}^2)$. Hence we will only
state the definition of curvelets restricted to

$$\mathcal{C} = \{\xi \in \mathbb{R}^2 : \|\xi\|_\infty \ge 1\}.$$

Let now $A_a$ denote the parabolic scaling matrix $A_a = \mathrm{diag}(a, \sqrt{a})$. Curvelets at scale
$j \ge 0$, orientation $\ell = 0, \ldots, 2^{j/2} - 1$, and spatial position $m = (m_1, m_2) \in \mathbb{Z}^2$ are then
defined by their Fourier transforms, for $\xi \in \mathbb{R}^2$ with $(r, \omega)$ denoting the associated
polar coordinates,

$$\hat\gamma_\mu(\xi) = 2^{-j\frac{3}{4}} \cdot W(r/2^j) \, V((\omega - \theta_{j,\ell}) 2^{j/2}) \cdot e^{i \langle R_{\theta_{j,\ell}} A_{2^{-j}} m, \xi \rangle},$$

where $\theta_{j,\ell} = 2\pi\ell/2^{j/2}$, $R_\theta$ is planar rotation by $-\theta$ radians, and we let $\mu = (j, \ell, m)$
index scale, orientation, and position. We refer to [11, Sect. 4.3, pp. 210-211] for more
details, and to Figure 1 for an illustration of the induced tiling of the frequency plane.




Figure 1: The tiling of the frequency domain induced by curvelets. 
1.2.2 Shearlets

In 2006, a novel directional representation system - so-called shearlets - was proposed
in [15, 20], which provides a unified treatment for the continuum and digital world. The
main point in comparison with curvelets is the fact that angles are replaced by slopes when
parameterizing directions, which greatly supports the treatment of the digital setting. Hence
the theory of shearlets allows an associated digital theory which can be directly implemented 
[23]. 
In a similar way as curvelets, shearlets live on anisotropic regions of width $2^{-j}$ and
length $2^{-j/2}$ at various orientations, which are now parameterized by slope rather than
angle as for curvelets. Similar to the definition of curvelets stated in Subsection 1.2.1, also
here we will ignore the low frequency part, and just mention that it can be appropriately
included to yield a tight frame for $L^2(\mathbb{R}^2)$. Let now the Fourier transform $W$ of a wavelet
and a bump function $V$ be chosen as in Subsection 1.2.1, and let $\mathcal{C}^{(1)}$ and $\mathcal{C}^{(2)}$ denote the
following two cones:

$$\mathcal{C}^{(1)} = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : |\xi_1| \ge 1, \; |\xi_2/\xi_1| \le 1\},$$

$$\mathcal{C}^{(2)} = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : |\xi_2| \ge 1, \; |\xi_1/\xi_2| \le 1\}.$$

For cone $\mathcal{C}^{(1)}$, at scale $j \ge 0$, orientation $k = -\lceil 2^{j/2} \rceil, \ldots, \lceil 2^{j/2} \rceil$, and spatial position
$m \in \mathbb{Z}^2$, the associated shearlets are defined by

$$\sigma_\eta(x) = 2^{j\frac{3}{4}} \psi(S_k A_{2^j} x - m),$$

where $S_k$ denotes the shear matrix

$$S_k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix},$$

and $\eta = (j, k, m, 1)$ indexes scale, orientation, position, and cone. We now assume that
$\psi \in L^2(\mathbb{R}^2)$ is chosen such that

$$\hat\psi(\xi_1, \xi_2) = W(\xi_1) V(\xi_2/\xi_1),$$

wherefore

$$\hat\sigma_\eta(\xi) = 2^{-j\frac{3}{4}} W(\xi_1/2^j) \, V(k + 2^{j/2} \xi_2/\xi_1) \, e^{i \langle S_k^T A_{2^{-j}} m, \xi \rangle}.$$

The shearlets for $\mathcal{C}^{(2)}$ are defined likewise by symmetry, as illustrated in Figure 2; this
initiated the terminology cone-adapted shearlets in contrast to shearlets arising directly
from a group representation (cf. [19]).
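The two matrices entering the shearlet definition are easy to write down explicitly. The following sketch is an illustration only; the function names are hypothetical, and the shear matrix is taken in the upper-triangular form read off from the text. It builds $A_{2^j}$ and $S_k$ and checks that $\det(S_k A_{2^j}) = 2^{3j/2}$, so that the normalization factor $2^{3j/4}$ is the square root of the determinant, which is the factor that preserves the $L^2$ norm under the change of variables.

```python
import numpy as np

def parabolic_scaling(a):
    """A_a = diag(a, sqrt(a)): scales the first axis by a, the second by sqrt(a)."""
    return np.diag([float(a), float(np.sqrt(a))])

def shear(k):
    """S_k = [[1, k], [0, 1]] (assumed form): shear with slope parameter k."""
    return np.array([[1.0, float(k)], [0.0, 1.0]])

j, k = 4, 3                                   # illustrative scale and orientation
M = shear(k) @ parabolic_scaling(2.0 ** j)    # the matrix S_k A_{2^j}
det_M = float(np.linalg.det(M))               # shearing has determinant 1
l2_factor = det_M ** 0.5                      # L^2-normalizing factor
```

For $j = 4$ the determinant is $2^6$ and the normalizing factor is $2^3 = 2^{3j/4}$.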




Figure 2: The tiling of the frequency domain induced by cone-adapted shearlets.

We remark that the discrete shearlets considered, for instance, in [16] differ slightly
from this choice, since they are usually associated with a scaling of $4^j$. However, it is easily
checked - and we refer concerning this issue and additional details to the survey paper [21] -
that the shearlets as defined here also form a tight frame for $L^2(\mathbb{R}^2)$.

The attentive reader will have also noticed that we here consider the class of band-limited
shearlets, although a class of compactly supported shearlets with superior spatial domain
localization has recently been introduced (see [22, 18]). Since in this paper
we however aim to compare curvelets and shearlets and since curvelets are band-limited,
the class of band-limited shearlets is the canonical choice. Another issue to consider is the
fact that compactly supported shearlets are not a tight frame, thereby requiring adaptations
to the analysis. Additional thoughts on compactly supported versus band-limited shearlets
can be found in Section 4.

1.3 Equivalence Result 

The introduction of the concept of sparsity equivalence in Subsection 1.1 now motivates 
us to ask whether curvelets and shearlets belong to the same equivalence class, hence are 
sparsity equivalent. The many quite similar results on sparse approximation properties of
those two systems seem to indicate this. According to Lemma 1.1, the $\ell_p$ norm of the
cross-Grammian matrix reveals the true sparsity relation, and we obtain the following result,
whose lengthy proof is presented in Subsection 2.2.

Theorem 1.1 For all $0 < p < 1$,

$$\|(\langle \sigma_\eta, \gamma_\mu \rangle)_{\eta,\mu}\|_{Op,p} < \infty.$$

Now Lemma 1.1 can be applied to derive the already intuitively expected sparsity equiv- 
alence of shearlets and curvelets. 

Theorem 1.2 For all $0 < p < 1$, the shearlet frame $(\sigma_\eta)_\eta$ and the curvelet frame $(\gamma_\mu)_\mu$ are
sparsity equivalent in $\ell_p$ with respect to $L^2(\mathbb{R}^2)$.

1.4 Impact of Sparsity Equivalence 

The significance of the viewpoint of sparsity equivalence lies in the fact that it not only 
provides a thorough understanding of the ability of different anisotropic systems for sparse 
expansions when compared to each other - thereby providing a qualitative comparison -,
but it moreover allows the transfer of sparsity results without repeating quite similar proofs. 

The theorem presented in the previous subsection is a first demonstration of the power 
of such a higher level viewpoint of sparsity behavior. In fact, this result automatically 
leads to novel results on and insights in sparse expansions by curvelets and shearlets. A 
few examples, for which this conceptually new approach is fruitful, will be presented in 
Section 3 including optimally sparse approximations of cartoon-like images and the ability 
for geometric separation of morphologically distinct phenomena. 

1.5 Extensions and General Viewpoint 

As mentioned before, Theorems 1.1 and 1.2 are amenable to generalizations and extensions.
Previewing Section 4, we briefly discuss a few examples. 
• Curvelets and Shearlets. A statement similar to Theorem 1.2 should be provable for
first generation curvelets as well as for the new class of compactly supported shearlets.

• Other Systems. The analysis of sparsity equivalence of curvelets and shearlets carried out
here can and should be applied to other pairs of systems. Ideally, newly introduced
systems could be compared to a system whose sparse approximation properties are
already very well understood.

• Systems with Continuous Parameters. Certainly, we can also ask about similar spar- 
sity behavior for systems with continuous parameters. This however requires a differ- 
ent sparsity model; one conceivable path would be to compare their ability to resolve 
wavefront sets. 

• Weighted Norms. When aiming at transferring results such as sparse decompositions
of curvilinear integrals [7] or sparse decompositions of the Radon transform [8],
sometimes weighted $\ell_p$ norms might need to be analyzed. This is also essential for analyzing
associated approximation spaces.

1.6 Outline 

We start by presenting the analysis of sparsity equivalence between curvelets and shearlets 
and providing the proof of Theorem 1.1. We then analyze the impact of this and related 
results on sparse approximation properties of anisotropic systems in Section 3. In particular, 
we derive novel results on sparse approximation of cartoon-like images using curvelets and 
on the ability of geometric separation using shearlets and wavelets. This section is followed 
by a discussion on extensions of our framework (see Section 4). 

2 Sparsity Equivalence between Curvelets and Shearlets 

In this section our goal is to prove sparsity equivalence in $\ell_p$ of curvelets and shearlets for
all $0 < p < 1$. Due to Lemma 1.1, this task is reduced to proving Theorem 1.1, i.e., showing
that the $\|\cdot\|_{Op,p}$-norm of the cross-Grammian matrix of curvelets and shearlets is finite.

We first realize that for our analysis we only need to consider those curvelets and shearlets
which respond to the high-frequency content of a function. More precisely, if we are
given a function, say $f \in L^2(\mathbb{R}^2)$, we might decompose it as $f = f_L + f_H = g_L * f + g_H * f$,
where $g_L$ is a low pass filter with $(\hat g_L)|_{\mathcal{C}^c} = 1$, and $g_H$ is an 'associated' high pass filter
satisfying $\hat g_L + \hat g_H = 1$. Now notice that the inner products between elements of both
frames corresponding to $g_L$ are negligible due to their almost orthogonality, since they are
scaling functions; the inner products of those elements with elements corresponding to
$g_H$ are negligible for a similar reason.

This argument shows that it is sufficient to only consider the cross-Grammian matrix 
of the elements of the curvelet and shearlet frame introduced in Subsection 1.2, i.e., those 
analyzing the high-frequency part of a function. 
2.1 Estimates for the Entries of the Cross-Grammian Matrix

We start by establishing estimates on the absolute values of inner products of curvelets and
shearlets. An essential ingredient will be the following well-known result, which we state
here for the convenience of the reader. A detailed proof can for instance be found in [20,
Lem. 2.3].

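Lemma 2.1 Suppose $g$ satisfies $\hat g \in C_0^\infty(\mathbb{R}^2)$ with $\hat g$ being supported on a fixed bounded
rectangle $R \subset \mathbb{R}^2$. Then, for each $N \in \mathbb{N}$, there exists a constant $C_N$ such that

$$|g(x)| \le C_N (1 + |x|^2)^{-N} \quad \text{for all } x \in \mathbb{R}^2.$$

In particular, $C_N = N \lambda(R) (\|\hat g\|_\infty + \|\Delta^N \hat g\|_\infty)$, where $\Delta = \sum_{i=1}^2 \frac{\partial^2}{\partial \xi_i^2}$ denotes the frequency
domain Laplacian operator and $\lambda(R)$ is the Lebesgue measure of $R$.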

In [11] the following conclusion was drawn from this lemma which we will also require 
for our proof. 

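Lemma 2.2 [11, Lem. 5.6] Suppose $(f_j)_{j \ge 0}$ is a sequence of functions satisfying that each
$\hat f_j$ is supported in a rectangle $R_j = A_{2^j}([-C_1, C_1] \times [-C_2, C_2])$ and every scaled function

$$g_j(\xi) = 2^{\frac{3j}{2}} \hat f_j(A_{2^j} \xi)$$

obeys $\|g_j\|_{C^N} \le \rho_N$ for $N = 2, 4, 6, \ldots$ with each $\rho_N$ being independent of $j$. Then, for
$N = 2, 4, 6, \ldots$, there exist constants $C_N$ such that

$$|f_j(x)| \le C_N (\rho_0 + \rho_N) \langle A_{2^j} x \rangle^{-N} \quad \text{for all } x \in \mathbb{R}^2,$$

where

$$\langle y \rangle = (1 + |y|^2)^{1/2}.$$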

The estimates which are proved in the following proposition are carefully designed so
that the previously stated claim concerning the $\|\cdot\|_{Op,p}$-norm, $0 < p < 1$, of the
cross-Grammian matrix of curvelets and shearlets follows almost immediately as a corollary.
We note that a similar estimate for the second cone $\mathcal{C}^{(2)}$ holds with an analogous proof.

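Proposition 2.1 Let $j, \tilde{j} \ge 0$, $|k| \le \lceil 2^{j/2} \rceil$, $0 \le \ell < 2^{\tilde{j}/2}$ and $m, \tilde{m} \in \mathbb{Z}^2$. Then, for each
$N = 2, 4, 6, \ldots$, there exist constants $C_N$ so that

$$|\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle| \le C_N \, 1_{\{|j - \tilde{j}| \le 2\}} \, 1_{\{k \in K_{j,\tilde{j},\ell}\}} \, 1_{\{\ell \in L_{j,\tilde{j},k}\}} \, \langle |b_{j,k,m,\tilde{j},\ell,\tilde{m}}| \rangle^{-N},$$

where

$$K_{j,\tilde{j},\ell} = \{k : \lfloor -2^{j/2} \tan(2^{-\tilde{j}/2}(1 + 2\pi\ell)) - 1 \rfloor \le k \le \lceil -2^{j/2} \tan(2^{-\tilde{j}/2}(-1 + 2\pi\ell)) + 1 \rceil\},$$

$$L_{j,\tilde{j},k} = \{\ell : \lfloor 2^{\tilde{j}/2} \arctan(2^{-j/2}(-1 - k)) - 1 \rfloor \le 2\pi\ell \le \lceil 2^{\tilde{j}/2} \arctan(2^{-j/2}(1 - k)) + 1 \rceil\},$$

and

$$b_{j,k,m,\tilde{j},\ell,\tilde{m}} = A_{2^j}(S_k^T A_{2^{-j}} m - R_{\theta_{\tilde{j},\ell}} A_{2^{-\tilde{j}}} \tilde{m}).$$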
Proof. To illustrate the different supports in frequency domain of $\hat\sigma_{j,k,m,1}$ and $\hat\gamma_{\tilde{j},\ell,\tilde{m}}$, a
property which will be exploited in the sequel, we refer to Figure 3.
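We now fix $j, k, m$. By employing Plancherel's theorem, we have

$$\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle = \int f_{j,\tilde{j},k,\ell}(\xi) \cdot e^{i \langle S_k^T A_{2^{-j}} m - R_{\theta_{\tilde{j},\ell}} A_{2^{-\tilde{j}}} \tilde{m}, \xi \rangle} \, d\xi, \qquad (4)$$

where $f_{j,\tilde{j},k,\ell}$ collects the factors of $\hat\sigma_{j,k,m,1}$ and $\hat\gamma_{\tilde{j},\ell,\tilde{m}}$ which do not depend on $m$ and $\tilde{m}$.

Due to the support conditions of $W$ and $V$, the support of $\hat\sigma_{j,k,m,1}$ equals

$$\mathrm{supp}\, \hat\sigma_{j,k,m,1} = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : \xi_1 \in [2^{j-1}, 2^{j+1}], \; \xi_2/\xi_1 \in 2^{-j/2}([-1,1] - k)\}, \qquad (5)$$

whereas the support of $\hat\gamma_{\tilde{j},\ell,\tilde{m}}$ is

$$\mathrm{supp}\, \hat\gamma_{\tilde{j},\ell,\tilde{m}} = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : r \in [2^{\tilde{j}-1}, 2^{\tilde{j}+1}], \; \omega \in 2^{-\tilde{j}/2}[-1,1] + \theta_{\tilde{j},\ell}\}. \qquad (6)$$

We conclude that $\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle = 0$ unless $|j - \tilde{j}| \le 2$, hence

$$|\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle| = |\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle| \, 1_{\{|j - \tilde{j}| \le 2\}}. \qquad (7)$$

Our next task is to estimate the range of $\ell$ for which $|\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle|$ is non-zero. This
will be done by showing that this parameter is contained in a compact set whose size is
uniformly bounded as $j, \tilde{j} \to \infty$. For this, we will study the slopes of the boundaries of
the supports of $\hat\sigma_{j,k,m,1}$ and $\hat\gamma_{\tilde{j},\ell,\tilde{m}}$ in angular direction. For better comparison with (5), the
support (6) might be rewritten as

$$\mathrm{supp}\, \hat\gamma_{\tilde{j},\ell,\tilde{m}} = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : r \in [2^{\tilde{j}-1}, 2^{\tilde{j}+1}], \; \xi_2/\xi_1 \in \tan(2^{-\tilde{j}/2}([-1,1] + 2\pi\ell))\}. \qquad (8)$$

Notice that the angle between the two angular boundary lines of the support of curvelets
does not change with $\ell$, whereas in the shearlet case the angle becomes smaller as the
support of the Fourier transform of the shearlet approaches the angle bisector of the first
quadrant. From (5) and (8), it follows that $f_{j,\tilde{j},k,\ell} = 0$ if

$$\tan(2^{-\tilde{j}/2}(1 + 2\pi\ell)) < 2^{-j/2}(-1 - k) \quad \text{or} \quad \tan(2^{-\tilde{j}/2}(-1 + 2\pi\ell)) > 2^{-j/2}(1 - k).$$

Continuing (7), this implies

$$|\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle| = |\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle| \, 1_{\{|j - \tilde{j}| \le 2\}} \, 1_{\{k \in K_{j,\tilde{j},\ell}\}} \, 1_{\{\ell \in L_{j,\tilde{j},k}\}}, \qquad (9)$$

with $K_{j,\tilde{j},\ell}$ and $L_{j,\tilde{j},k}$ as defined in the statement of the proposition.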

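Next we aim to estimate the decay in $m$ and $\tilde{m}$ by making use of Lemma 2.2. To prepare
the application of this lemma, we rescale the function $f_{j,\tilde{j},k,\ell}$ in the term on the RHS of (4)
by the parabolic scaling matrix $A_{2^j}$, in the same way as $g_j$ is obtained from $\hat f_j$ in Lemma 2.2.

This yields a function which can be decomposed into factors in the following way:

$$g_{j,\tilde{j},k,\ell}(u, v) = W_{0,j}(u, v) \, W_{1,j}(u, v) \, V_{0,j}(u, v) \, V_{1,j}(u, v).$$

All factors belong to $C^\infty$, and it can be checked that their derivatives are bounded
independently of $j$ (for a similar argument see [11, Subsec. 5.2]). This allows us to apply
Lemma 2.2 to obtain

$$|\check f_{j,\tilde{j},k,\ell}(x)| \le C_N \langle |A_{2^j} x| \rangle^{-N}, \quad N = 2, 4, 6, \ldots.$$

From this we conclude that, for $N = 2, 4, 6, \ldots$,

$$|\langle \sigma_{j,k,m,1}, \gamma_{\tilde{j},\ell,\tilde{m}} \rangle| \le C_N \langle |b_{j,k,m,\tilde{j},\ell,\tilde{m}}| \rangle^{-N}.$$

Combining this estimate with the estimates from (7) and (9) proves the proposition. □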

2.2 Proof of Theorem 1.1 

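Let $0 < p < 1$. We start by proving that

$$\sup_\mu \sum_\eta |\langle \sigma_\eta, \gamma_\mu \rangle|^p < \infty. \qquad (10)$$

Setting $\eta = (j, k, m, 1)$ and $\mu = (\tilde{j}, \ell, \tilde{m})$, by Proposition 2.1,

$$\sup_\mu \sum_\eta |\langle \sigma_\eta, \gamma_\mu \rangle|^p \le C_{N,p} \sup_{\tilde{j},\ell,\tilde{m}} \sum_{\{|j - \tilde{j}| \le 2\}, \{k \in K_{j,\tilde{j},\ell}\}} \sum_m \langle |b_{j,k,m,\tilde{j},\ell,\tilde{m}}| \rangle^{-pN}$$

$$\le C'_{N,p} \sup_{j,\ell,\tilde{m}} \sum_{\{k \in K_{j,j,\ell}\}} \sum_m \langle |b_{j,k,m,j,\ell,\tilde{m}}| \rangle^{-pN}. \qquad (11)$$

The last estimate was derived by observing that the maximum of $\sum_m \langle |b_{j,k,m,\tilde{j},\ell,\tilde{m}}| \rangle^{-pN}$ is
attained if $j = \tilde{j}$.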
We next compute the number of integers $k$ satisfying $|k| \le \lceil 2^{j/2} \rceil$ which are contained
in $K_{j,j,\ell}$. We observe that $\#(K_{j,j,\ell})$ is maximal if $\ell$ is chosen so that the upper bound of the
curvelet coincides with the angle bisector, the reason being that the support in frequency
domain of this 'corner curvelet' has a maximal number of intersections with frequency
supports of shearlets. In fact, the angular support of the Fourier transform of shearlets
becomes smaller when the angle increases, hence more shearlets are needed to overlap the
angular frequency support of a curvelet, which does not change its size with varying angle
(also compare the proof of Proposition 2.1). Hence, using

$$\mathrm{supp}\, \hat\gamma_{j,\ell,\tilde{m}} = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : r \in [2^{j-1}, 2^{j+1}], \; \omega \in 2^{-j/2}[-1,1] + \theta_{j,\ell}\}$$

(cf. (6)), it is sufficient to restrict to the situation

$$\frac{\pi}{4} = 2^{-j/2} + \theta_{j,\ell}.$$

By definition of $\theta_{j,\ell}$, we therefore obtain the condition

$$\ell = (2\pi)^{-1}(2^{j/2}\pi/4 - 1).$$

The definition of $K_{j,j,\ell}$ implies

$$\#(K_{j,j,\ell}) \le -2^{j/2} \tan(2^{-j/2}(-1 + 2^{j/2}\pi/4 - 1)) - (-2^{j/2} \tan(2^{-j/2}(1 + 2^{j/2}\pi/4 - 1))) + 3$$

$$\le 2^{j/2}(1 - \tan(\pi/4 - 2 \cdot 2^{-j/2})) + 3.$$

Now

$$2^{j/2}(1 - \tan(\pi/4 - 2 \cdot 2^{-j/2})) \to 4, \quad j \to \infty,$$

hence,

$$\#(K_{j,j,\ell}) \to 7, \quad j \to \infty.$$
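The limit used for the count of shear parameters can be checked numerically. The following is a minimal sketch, assuming the bound reads $2^{j/2}(1 - \tan(\pi/4 - 2 \cdot 2^{-j/2})) + 3$; the function name is hypothetical. Since the derivative of $\tan$ at $\pi/4$ equals 2, the first term behaves like $4 - 8 \cdot 2^{-j/2}$ for large $j$.

```python
import math

def k_count_bound(j):
    """Evaluate 2^(j/2) * (1 - tan(pi/4 - 2 * 2^(-j/2))) + 3 for a given scale j."""
    h = 2.0 ** (-j / 2.0)
    return (1.0 - math.tan(math.pi / 4.0 - 2.0 * h)) / h + 3.0

# The bound increases towards 7 as the scale j grows.
values = [k_count_bound(j) for j in (10, 20, 30, 40)]
```

The list `values` approaches 7 from below, in line with the asymptotic count stated in the text.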

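From (11), we can then conclude that

$$\sup_\mu \sum_\eta |\langle \sigma_\eta, \gamma_\mu \rangle|^p \le C''_{N,p} \sup_{j,k,\ell,\tilde{m}} \sum_m \langle |b_{j,k,m,j,\ell,\tilde{m}}| \rangle^{-pN}. \qquad (12)$$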

Next we aim to prove that 

m m 

The second inequality follows easily from the facts that (|m|)~^ < 2~^(mi)~^(m2)~^ and 
choosing pN large enough such that (^n-i)"^^ < 00 for i = 1,2. Concerning the first 
inequality in (13), recall that 

$$b_{j,k,m,j,\ell,\tilde{m}} = A_{2^j}(S_k^T A_{2^{-j}} m - R_{\theta_{j,\ell}} A_{2^{-j}} \tilde{m}).$$

Since we sum over $m$, WLOG we can assume that $\tilde{m} = 0$. We have

$$|A_{2^j} S_k^T A_{2^{-j}} m| = |(m_1, 2^{-j/2} k m_1 + m_2)|,$$

and hence it follows immediately that

m m 

This completes the proof of (13). 

Finally, (10) follows from the application of (13) to (12) and an estimate similar to
Proposition 2.1 for the second cone $\mathcal{C}^{(2)}$ to handle the indices $\eta = (j, k, m, 2)$.

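It remains to prove that

$$\sup_\eta \sum_\mu |\langle \sigma_\eta, \gamma_\mu \rangle|^p < \infty. \qquad (14)$$

Again, by Proposition 2.1,

$$\sup_\eta \sum_\mu |\langle \sigma_\eta, \gamma_\mu \rangle|^p \le C_{N,p} \sup_{j,k,m} \sum_{\{|j - \tilde{j}| \le 2\}, \{\ell \in L_{j,\tilde{j},k}\}} \sum_{\tilde{m}} \langle |b_{j,k,m,\tilde{j},\ell,\tilde{m}}| \rangle^{-pN}$$

$$\le C'_{N,p} \sup_{j,k,m} \sum_{\{\ell \in L_{j,j,k}\}} \sum_{\tilde{m}} \langle |b_{j,k,m,j,\ell,\tilde{m}}| \rangle^{-pN}. \qquad (15)$$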

We now need to estimate $\#(L_{j,j,k})$. Recalling our 'worst-case discussion' in the previous
case, the number of elements in $L_{j,j,k}$ reaches its maximum if $k = 0$, i.e., the Fourier
transform of the shearlet associated with $k$ 'sits' precisely on the x-axis. In this case, using
the definition of $L_{j,j,k}$,

$$\#(L_{j,j,k}) \le (2\pi)^{-1}(2^{j/2} \arctan(2^{-j/2}) - 1) - (2\pi)^{-1}(2^{j/2} \arctan(-2^{-j/2}) + 1) + 3$$

$$= (2\pi)^{-1} 2^{j/2} (\arctan(2^{-j/2}) - \arctan(-2^{-j/2})) - \pi^{-1} + 3$$

$$\le \pi^{-1}(2^{j/2} \arctan(2^{-j/2}) - 1) + 3.$$

Consequently,

$$\#(L_{j,j,k}) \to 3, \quad j \to \infty.$$
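The corresponding limit for the count of angular parameters can be checked in the same way. This sketch assumes the final bound reads $\pi^{-1}(2^{j/2} \arctan(2^{-j/2}) - 1) + 3$; the function name is hypothetical. Since $s \arctan(1/s) \to 1$ as $s \to \infty$, the first term vanishes in the limit.

```python
import math

def l_count_bound(j):
    """Evaluate (2^(j/2) * atan(2^(-j/2)) - 1) / pi + 3 for a given scale j."""
    s = 2.0 ** (j / 2.0)
    return (s * math.atan(1.0 / s) - 1.0) / math.pi + 3.0

# The bound stays below 3 and tends to 3 as j grows.
l_values = [l_count_bound(j) for j in (0, 10, 20, 40)]
```

The entries of `l_values` increase towards 3, matching the asymptotic statement in the text under the stated reading of the bound.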
Hence, continuing the computation in (15),

$$\sup_\eta \sum_\mu |\langle \sigma_\eta, \gamma_\mu \rangle|^p \le C''_{N,p} \sup_{j,\ell,k,m} \sum_{\tilde{m}} \langle |b_{j,k,m,j,\ell,\tilde{m}}| \rangle^{-pN}.$$

Combining this estimate with 

^{\bj,k,m,jArn\)-'''' < CAr,p^(|m|)-f^ < C^,^, 
m m 

which can be proven similarly as (13) (cf. also [11, Sect. 5.2]), the claim (14) follows. This 
completes the proof. 
3 Impact of Sparsity Equivalence



To illustrate the impact of the concept of sparsity equivalence focussing on the chosen 
exemplary case of curvelets and shearlets, we now discuss two different situations in which 
the application of Theorem 1.2 automatically leads to novel results. 

We might have also included the search for optimally sparse expansions of Fourier In- 
tegral Operators of order 0. Since such a result is however already known for curvelets and 
shearlets - with not surprisingly quite similar proofs -, our considerations cannot lead to 
new results. They however point to a simplified analysis once the result was known for 
either curvelets or shearlets. 

3.1 Optimal Sparse Representation of $C^2$-Curvilinear Singularities

To efficiently process image data, optimally sparse approximations are crucial. As already 
discussed in Subsection 1.1, the ability to sparsely approximate a class of signals is measured 
by the decay of the error of the n-term approximation using the largest n coefficients in 
magnitude; see (1). Choosing the 'correct' model class for images is certainly a highly 
delicate task. In 2004, Candes and Donoho proposed a so-called cartoon model [9] motivated 
by the fact that edges are the most prominent features in images, a fact also evidenced in 
computer vision. 

The cartoon model they proposed is defined as follows: Let $B \subset [0, 1]^2$ be bounded by
a closed curve whose curvature is uniformly bounded by some $\nu > 0$, and let $STAR^2(\nu)$
be the class of translates of such sets $B$. Then the class of cartoon-like images $\mathcal{E}^2(\nu)$ is
defined to be the set of functions $f$ on $\mathbb{R}^2$ of the form

$$f = f_0 + f_1 \chi_B,$$

where $f_0, f_1 \in C^2(\mathbb{R}^2)$ with compact support in $[0,1]^2$, $B \in STAR^2(\nu)$, and
$\|f\|_{C^2} = \sum_{|\alpha| \le 2} \|D^\alpha f\|_\infty \le 1$.

By information theoretic arguments, it can be shown that the optimally achievable rate
of sparse approximations under weak conditions on the dictionary and the selection process
is $N^{-2}$ as $N \to \infty$. For first generation curvelets [9] as well as for shearlets [16] (see also
[22]), this rate is achieved up to a multiplicative log factor of $(\log N)^3$.

We now claim that also (second generation) curvelets achieve the optimal sparse
approximation rate up to a factor negligible compared to $N^{-2}$.

Theorem 3.1 The curvelet frame $(\gamma_\mu)_\mu$ provides (almost) optimally sparse approximations
of functions $f \in \mathcal{E}^2(\nu)$, i.e., there exists some $C > 0$ such that

$$\|f - f_N\|_2^2 \le C \cdot N^{-2} \cdot D(N) \quad \text{as } N \to \infty,$$

where $f_N$ is the nonlinear N-term approximation obtained by choosing the $N$ largest curvelet
coefficients of $f \in \mathcal{E}^2(\nu)$ and $N^{-\varepsilon} \cdot D(N) \to 0$ as $N \to \infty$ for all $\varepsilon > 0$.

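Proof. Given some $f \in \mathcal{E}^2(\nu)$, similar to (2), it suffices to prove that

$$\sum_{n > N} |(\langle f, \gamma_\mu \rangle)_{(n)}|^2 \le C \cdot N^{-2} \cdot D(N) \quad \text{as } N \to \infty \qquad (16)$$

with $N^{-\varepsilon} \cdot D(N) \to 0$ as $N \to \infty$ for all $\varepsilon > 0$. We remark that in the following the constants
might change; by abuse of notation, we however always denote them by $C$.

First recall that, by [16, Thm 1.1], the shearlet frame $(\sigma_\eta)_\eta$ achieves the rate

$$\sup_{g \in \mathcal{E}^2(\nu)} |(\langle g, \sigma_\eta \rangle)_{(n)}| \le C \cdot n^{-3/2} \cdot (\log n)^{3/2} \quad \text{for each } n. \qquad (17)$$

Now let $\varepsilon > 0$, and choose $p = 2/3 + \varepsilon$. Then, by (17),

$$\sup_{g \in \mathcal{E}^2(\nu)} \|(\langle g, \sigma_\eta \rangle)_\eta\|_p^p \le \sup_{g \in \mathcal{E}^2(\nu)} C \cdot \sum_n (n^{-3/2} \cdot (\log n)^{3/2})^{2/3 + \varepsilon} < \infty.$$

By Theorem 1.2, this implies that $\sup_{g \in \mathcal{E}^2(\nu)} \|(\langle g, \gamma_\mu \rangle)_\mu\|_p \le C$. Hence, for each $n$,

$$\sup_{g \in \mathcal{E}^2(\nu)} |(\langle g, \gamma_\mu \rangle)_{(n)}| \le C \cdot n^{-1/p},$$

and therefore

$$\sum_{n > N} |(\langle f, \gamma_\mu \rangle)_{(n)}|^2 \le C \cdot \sum_{n > N} n^{-2/p} \le C \cdot N^{1 - 2/p}.$$

In the definition of $p$ the variable $\varepsilon$ can be chosen arbitrarily small, which implies (16), and
the theorem is proved. □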
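The last step of the proof rests on a standard tail estimate: if the sorted coefficients decay like $n^{-1/p}$, the tail energy $\sum_{n > N} n^{-2/p}$ is bounded by the integral $\int_N^\infty x^{-2/p}\,dx = N^{1 - 2/p}/(2/p - 1)$, which for $p = 2/3$ gives the rate $N^{-2}$. The following sketch illustrates this numerically; the function names, the truncation level `n_max` and the value of $\varepsilon$ are illustrative assumptions.

```python
import numpy as np

def tail_energy(N, p, n_max=200_000):
    """Truncated tail sum of n^(-2/p) over N < n <= n_max."""
    n = np.arange(N + 1, n_max + 1, dtype=float)
    return float(np.sum(n ** (-2.0 / p)))

def integral_bound(N, p):
    """Integral of x^(-2/p) over [N, infinity), valid for p < 2."""
    q = 2.0 / p
    return float(N) ** (1.0 - q) / (q - 1.0)

eps = 0.05                 # illustrative choice of epsilon
p = 2.0 / 3.0 + eps
checks = [tail_energy(N, p) <= integral_bound(N, p) for N in (10, 100, 1000)]
```

As $\varepsilon \to 0$ the exponent $1 - 2/p$ tends to $-2$, which is the decay rate claimed in Theorem 3.1 up to the factor $D(N)$.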

3.2 Geometric Separation 

Natural images are typically composed of morphologically distinct features; an example 
being spines (pointlike structures) and dendrites (curvelike structures) in neurobiological 
imaging. One goal is to automatically extract those components for separate analysis. In 
[14], the author, joint with Donoho, studied the situation of images composed of point- and 
curvelike structures, for which they introduced models by

$$\mathcal{P} = \sum_{i=1}^{P} |x - x_i|^{-3/2} \quad \text{and} \quad \mathcal{C} = \int \delta_{\tau(t)} \, dt, \quad \text{with } \tau : [0, 1] \to \mathbb{R}^2 \text{ a closed curve}, \qquad (18)$$

respectively. The Geometric Separation Problem now consists in extracting $\mathcal{P}$ and $\mathcal{C}$ from
knowledge of $f$ given by

$$f = \mathcal{P} + \mathcal{C}.$$

In [14], a particular decomposition technique based on $\ell_1$ minimization was employed which
required suitably chosen overcomplete systems which sparsify the different components.
Using the tight frame of radial wavelets for the pointlike structures and the tight frame 
of curvelets for the curvelike structures, asymptotically arbitrarily precise separation was 
proven. 

Using the results on sparsity equivalence derived in this paper, we can now prove that 
a different pair of representation systems can be utilized for this Geometric Separation 
Problem, which is more suitable for a digital realization: orthonormal separable Meyer 
wavelets and shearlets. In contrast to the pair considered before, surprisingly, now one 
system even forms an orthonormal basis. 
For the reader's convenience, we first briefly recall the definition of orthonormal separable Meyer wavelets. Let $W \in L^2(\mathbb{R})$ denote the Fourier transform of the Meyer wavelet and $\phi \in L^2(\mathbb{R})$ the associated scaling function. Letting $W^h \in L^2(\mathbb{R}^2)$, $h = 1,2,3$, be defined by

$$W^1(\xi) = \phi(\xi_1) W(\xi_2), \quad W^2(\xi) = W(\xi_1) \phi(\xi_2) \quad \text{and} \quad W^3(\xi) = W(\xi_1) W(\xi_2),$$

the orthonormal separable Meyer wavelets at scale j and spatial position n are defined by 
their Fourier transforms 

where $\nu = (h, j, n)$ indexes the type of mother function, scale, and position. This system forms an orthonormal basis for $L^2(\mathbb{R}^2)$. For each $j$, the functions $\hat{\psi}_\nu$ are supported on the corona $\mathcal{Z}_{2^{2j+1}\pi/3}$, where

$$\mathcal{Z}_r = \{\xi \in \mathbb{R}^2 : r \le \|\xi\|_\infty \le 4r\}$$

(see Figure 4). For more details we refer to [24].

Figure 4: The tiling of the frequency domain induced by orthonormal separable Meyer 
wavelets. 

Shearlets $\sigma_\eta$, where $\eta = (j,k,m,\iota)$ indexes scale, orientation, position, and cone, were already defined in Subsection 1.2.2, but to match them with Meyer wavelets, we now choose $W$ to be the Fourier transform of the Meyer wavelet. We wish to draw the reader's attention to the fact that the supports of orthonormal separable Meyer wavelets match perfectly with the supports of shearlets. In fact, for each scale $j$, the Fourier transforms of the elements of both systems are supported on $\mathcal{Z}_{2^{2j+1}\pi/3}$.

We next construct a family of filters $F_j$ with transfer functions

$$\hat{F}_j(\xi) = W(\|\xi\|_\infty / 2^{2j}), \quad \xi \in \mathbb{R}^2,$$

leading to a decomposition of a function $g$ into functions $g_j = F_j \star g$ defined on the frequency corona $\mathcal{Z}_{2^{2j+1}\pi/3}$, equipped with the reconstruction formula $g = \sum_j F_j \star g_j$. Let $\mathcal{F}_j$ denote the range of the operator of convolution with $F_j$. Then shearlets at level $j'$ are orthogonal to $\mathcal{F}_j$ unless $|j' - j| \le 1$. Similarly, orthonormal separable Meyer wavelets at level $j'$ are orthogonal to $\mathcal{F}_j$ unless $|j' - j| \le 1$. The proofs of these two claims use precisely the same arguments as the corresponding result in [14], so we omit them.
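
To illustrate why the reconstruction applies the filter a second time, the following Python sketch works directly with Fourier samples along the radial variable. It uses a hypothetical window whose squares sum to one across scales; this window is an assumption chosen for simplicity and is not the Meyer window, and the base-4 scaling of the radius is likewise an assumption of the sketch.

```python
import math

def window(j, r):
    """Hypothetical band-pass window at scale j and radius r > 0 (not the Meyer window)."""
    t = math.log(r, 4) - j  # log-radius relative to scale j, base 4 (assumed scaling)
    if 0.0 <= t < 1.0:
        return math.sin(0.5 * math.pi * t)          # rising edge
    if 1.0 <= t < 2.0:
        return math.cos(0.5 * math.pi * (t - 1.0))  # falling edge
    return 0.0

def decompose(g_hat, radii, scales):
    """Fourier-side filtering: multiply the samples of g^ by the window of each scale."""
    return {j: [window(j, r) * c for r, c in zip(radii, g_hat)] for j in scales}

def reconstruct(parts, radii):
    """Apply each window once more and sum over the scales."""
    out = [0j] * len(radii)
    for j, gj_hat in parts.items():
        for i, r in enumerate(radii):
            out[i] += window(j, r) * gj_hat[i]
    return out

radii = [1.5, 3.0, 7.0, 20.0, 50.0]           # sample frequency radii
g_hat = [1 + 2j, -0.5j, 3.0, 0.25 - 1j, 2j]   # arbitrary Fourier samples
parts = decompose(g_hat, radii, range(-2, 5))
g_rec = reconstruct(parts, radii)
```

Because the squared windows sum to one, `g_rec` agrees with `g_hat` up to rounding. Windows whose scales differ by two or more have disjoint supports, which is the toy analogue of the orthogonality statement for levels with $|j' - j| > 1$.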
We can now formulate the corresponding Component Separation Problem (CSep). For the sake of brevity, we let $Q_j$ denote the indices $\nu = (h,j,n)$ of orthonormal separable Meyer wavelets at level $j$, and let $Q_j^{\pm} = Q_{j-1} \cup Q_j \cup Q_{j+1}$. Likewise, we let $\Sigma_j$ denote the indices $\eta = (j,k,m,\iota)$ of shearlets at level $j$, and let $\Sigma_j^{\pm} = \Sigma_{j-1} \cup \Sigma_j \cup \Sigma_{j+1}$. Further, we denote the filtered composed image $f$ and the filtered point and curvilinear parts $\mathcal{P}$ and $\mathcal{C}$ (cf. (18)) by

$$f_j = F_j \star f = F_j \star (\mathcal{P} + \mathcal{C}) = \mathcal{P}_j + \mathcal{C}_j.$$

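Then we can formulate the Component Separation Problem as the following $\ell_1$ minimization problem:

$$\text{(CSep)} \quad (W_j, S_j) = \operatorname{argmin} \|(\langle W_j, \psi_\nu\rangle)_\nu\|_1 + \|(\langle S_j, \sigma_\eta\rangle)_\eta\|_1 \quad \text{subject to } f_j = W_j + S_j.$$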

We claim that the considered pair of representation systems leads to asymptotically perfect separation in the sense of the following theorem. Before stating the result, we remark that the proof draws on various definitions and lemmata from [14]. Since this is mostly an application of the main result of this paper, for the sake of brevity we only present a road map of its proof.

Theorem 3.2 Let $(W_j, S_j)$ denote the solution of (CSep). Then, we have

$$\frac{\|W_j - \mathcal{P}_j\|_2 + \|S_j - \mathcal{C}_j\|_2}{\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2} \to 0, \quad j \to \infty.$$



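Proof. The proof presented in [14] uses as one main idea the following estimate for each scale $j$: Let $S_{1,j}$ and $S_{2,j}$ be sets of 'significant coefficients' of wavelets and curvelets, respectively, let $\delta_j$ be the sparse approximation error given by

$$\delta_j = \sum_{\nu \notin S_{1,j}} |\langle \psi_\nu, \mathcal{P}_j\rangle| + \sum_{\eta \notin S_{2,j}} |\langle \sigma_\eta, \mathcal{C}_j\rangle|,$$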

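and let $(\mu_c)_j$ be the cluster coherence defined as

$$(\mu_c)_j = \max\Big\{ \max_{\eta} \sum_{\nu \in S_{1,j}} |\langle \psi_\nu, \sigma_\eta\rangle|, \; \max_{\nu} \sum_{\eta \in S_{2,j}} |\langle \psi_\nu, \sigma_\eta\rangle| \Big\}.$$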

Then [14, Prop. 2.1] applied to each filtered $f_j$ implies

$$\|W_j - \mathcal{P}_j\|_2 + \|S_j - \mathcal{C}_j\|_2 \le \frac{2\delta_j}{1 - 2(\mu_c)_j}.$$
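
The roles of the cluster coherence and of the resulting bound can be made concrete on a small finite-dimensional toy pair. The following Python sketch uses the standard basis (standing in for a pointlike system) and an orthonormal Walsh-Hadamard basis (standing in for a second, incoherent system). Both bases, all function names, and the chosen clusters are assumptions for illustration and are unrelated to the actual wavelet and shearlet systems.

```python
def hadamard(n):
    """Orthonormal Walsh-Hadamard basis of size n (n a power of two), as rows."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    scale = n ** -0.5
    return [[scale * x for x in row] for row in H]

def identity(n):
    """Standard basis of size n, as rows."""
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def one_sided_coherence(cluster, basis_a, basis_b):
    """max over b in basis_b of the sum over a in the cluster of |<a, b>|."""
    return max(sum(abs(dot(basis_a[i], b)) for i in cluster) for b in basis_b)

def cluster_coherence(S1, S2, basis1, basis2):
    """Maximum of the two one-sided cluster coherences."""
    return max(one_sided_coherence(S1, basis1, basis2),
               one_sided_coherence(S2, basis2, basis1))

def separation_bound(delta, mu_c):
    """Right-hand side 2*delta / (1 - 2*mu_c); only meaningful if mu_c < 1/2."""
    if mu_c >= 0.5:
        raise ValueError("bound requires cluster coherence below 1/2")
    return 2.0 * delta / (1.0 - 2.0 * mu_c)
```

For size 16 every inner product between the two bases has modulus 1/4, so single-element clusters give cluster coherence 1/4 and the bound reads $4\delta$, whereas two-element clusters already reach 1/2 and the bound becomes void. This is why the clusters have to be chosen so that the coherence tends to zero.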



Thus, the key step in [14] was the construction of clusters $S_{1,j}$ and $S_{2,j}$ having both of the following two properties: (i) asymptotically negligible cluster coherences:

$$(\mu_c)_j \to 0, \quad j \to \infty,$$

and (ii) asymptotically negligible cluster approximation errors:

$$\delta_j = o(\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2), \quad j \to \infty.$$

The same steps, with very similar arguments, can be performed for the pair wavelets-shearlets if adapted clusters $S_{1,j}$ and $S_{2,j}$ are defined by applying the following two key observations:
• It was shown in Theorem 1.2 that shearlets and curvelets are sparsity equivalent; more precisely, there exists a sparse matrix $N_1$, say, which satisfies

$$(\langle \sigma_\eta, g\rangle)_\eta = N_1 (\langle \gamma_\mu, g\rangle)_\mu$$

for any distribution $g$.

• Orthonormal separable Meyer wavelets and radial wavelets are likewise sparsity equivalent, i.e., there exists a sparse matrix $N_2$, say, which satisfies

$$(\langle \psi_\nu, g\rangle)_\nu = N_2 (\langle \psi_\lambda, g\rangle)_\lambda$$

for any distribution $g$.
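
One standard mechanism by which such a matrix can transfer $\ell_p$ summability of coefficient sequences for $p \le 1$ is the inequality $|\sum_k a_k|^p \le \sum_k |a_k|^p$, which gives $\|Nc\|_p^p \le \big(\max_k \sum_i |N_{i,k}|^p\big) \|c\|_p^p$. The following Python sketch checks this inequality on a hypothetical banded matrix with polynomially decaying entries. The matrix, its decay rate, and the test sequence are assumptions for illustration; they are not the matrices $N_1$ and $N_2$ above, and the sketch does not reproduce the proof of Theorem 1.2.

```python
def lp_power(c, p):
    """Return sum_k |c_k|**p, the p-th power of the l_p (quasi-)norm."""
    return sum(abs(x) ** p for x in c)

def apply(matrix, c):
    """Matrix-vector product N c for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, c)) for row in matrix]

def column_bound(matrix, p):
    """max over columns k of sum_i |N[i][k]|**p."""
    n_cols = len(matrix[0])
    return max(sum(abs(row[k]) ** p for row in matrix) for k in range(n_cols))

def banded_matrix(n, decay):
    """Hypothetical stand-in for a sparse transfer matrix: entries decay off the diagonal."""
    return [[(1.0 + abs(i - k)) ** (-decay) for k in range(n)] for i in range(n)]

p = 0.7                                                   # any 0 < p <= 1 works here
N = banded_matrix(50, 4.0)
c = [(-1) ** n * (n + 1) ** -1.5 for n in range(50)]      # decaying test sequence
lhs = lp_power(apply(N, c), p)
rhs = column_bound(N, p) * lp_power(c, p)
```

Here `lhs <= rhs` holds, so a uniform bound on the $p$-summed columns of the matrix is enough to carry $\ell_p$ bounds from one coefficient sequence to the other in this toy setting.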

A second ingredient is a set of estimates for inner products between wavelets and shearlets within the frames, but also across. For this, the lemma paralleling [14, Lem. 3.3], which has a very similar proof, is essential:

Lemma 3.1 For each $N = 1, 2, \ldots$ there is a constant $c_N$ so that

$$|\langle \psi_\nu, \psi_\lambda\rangle| \le c_N \cdot 1_{\{|j - j'| < 2\}} \cdot \langle |n - n'| \rangle^{-N}, \quad \forall \nu = (h,j,n), \ \forall \lambda = (h',j',n').$$

As remarked before, we do not lay out the precise details of the complete proof, since the arguments in the very lengthy and technical proof from [14] only need to be adapted in a straightforward manner to the sets of significant coefficients now based on the choice of orthonormal wavelets and shearlets. This yields Theorem 3.2, and thus asymptotically perfect separation using orthonormal separable Meyer wavelets and shearlets. □

4 Extensions and General Viewpoint 

So far we focused entirely on a very special situation, showing sparsity equivalence between curvelets and shearlets. Our goal was to show that for this exemplary situation sparsity equivalence can be established, provides insight into the relation between these systems, and leads automatically to novel results on sparse expansions for those two anisotropic systems.

This is, however, just the 'tip of the iceberg': the main results in this paper are amenable to very extensive generalizations and extensions.

• Curvelets and Shearlets. It is conceivable that a statement similar to Theorem 1.2 is provable for first generation curvelets as well as for the new class of compactly supported shearlets. It should be mentioned, though, that the compactly supported shearlet frames introduced so far are not tight frames, hence the framework developed in this paper needs to be extended to pairs of general frames.

• Other Systems. The analysis of sparsity equivalence of curvelets and shearlets carried out here can and should be applied to other pairs of systems. Ideally, newly introduced systems could be compared to a system whose sparse approximation properties are already very well understood.

• Systems with Continuous Parameters. Certainly, we can also ask about similar sparsity properties for systems with continuous parameters. This, however, requires a different sparsity model, where one conceivable path would be to compare resolution of wavefront set behavior in the sense of [10, 20].

• Weighted Norms. When aiming at transferring results such as sparse decompositions of curvilinear integrals [7] or sparse decompositions of the Radon transform [8], the framework needs to be generalized to weighted $\ell_p$ norms. The analysis of associated approximation spaces also requires this extension, since, for instance, the norm associated with the curvelet spaces introduced in [3, p. 67] is precisely a weighted mixed $\ell_{p,q}$ norm of the coefficient sequence.
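
As a point of reference for the last item, the following Python sketch computes a generic weighted mixed $\ell_{p,q}$ norm of coefficients grouped by scale: an $\ell_p$ norm within each scale, followed by a weighted $\ell_q$ norm across scales. The grouping by scale, the function name, and the weights are assumptions for illustration; the specific weights used for the curvelet spaces in [3] are not reproduced here.

```python
def weighted_mixed_norm(coeffs, weights, p, q):
    """Weighted mixed l_{p,q} norm of coefficients grouped by scale.

    coeffs  : dict mapping scale j -> list of coefficients at that scale
    weights : dict mapping scale j -> positive weight w_j (hypothetical choice)
    """
    total = 0.0
    for j, block in coeffs.items():
        inner = sum(abs(c) ** p for c in block) ** (1.0 / p)  # l_p norm within scale j
        total += (weights[j] * inner) ** q                    # weighted l_q sum over scales
    return total ** (1.0 / q)

# Two scales: the inner l_2 norms are 5 and 1, the weights are 1 and 2.
example = weighted_mixed_norm({0: [3.0, 4.0], 1: [1.0]}, {0: 1.0, 1: 2.0}, p=2, q=1)
```

With all weights equal to one and $p = q$, this reduces to the ordinary $\ell_p$ norm of the full coefficient sequence.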